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Spatial Multiresolution Cluster Detection Method 
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A novel multi-resolution cluster detection (MCD) method 
is proposed to identify irregularly shaped clusters in space. 
Multi-scale test statistic on a single cell is derived based 
on likelihood ratio statistic for Bernoulli sequence, Poisson 
sequence and Normal sequence. A neighborhood variabil- 
ity measure is defined to select the optimal test threshold. 
The MCD method is compared with single scale testing 
methods controlling for false discovery rate and the spa- 
tial scan statistics using simulation and f-MRI data. The 
MCD method is shown to be more effective for discover- 
ing irregularly shaped clusters, and the implementation of 
this method does not require heavy computation, making it 
suitable for cluster detection for large spatial data. 

AMS 2000 subject classifications: Primary 60K35, 
60K35; secondary 60K35. 

Keywords and phrases: Multiresolution, Scan Statistics, 
Scale Space Inference, Spatial Test. 



1. INTRODUCTION 

Spatial cluster detection is an important problem in many 
different fields, such as epidemiology and image analysis. In 
this paper, we focus on the following spatial cluster detec- 
tion problem: to detect an (irregular) spatial region which 
has a different signal intensity compared to the background. 
Specifically, we focus on a large spatial region, e.g. an im- 
age or a geographical map. We assume that when there is 
no signal in it, the data collected at all locations are inde- 
pendent and follow the same distribution with an unknown 
parameter. Under the alternative hypothesis, within an (un- 
known) sub-region, the data observed are from a different 
distribution. Let T> be the entire spatial region, s £ T> be 
the location in 2?, and Y(s) be the observed data at location 
s. Let T>_a be a subregion in T>, i.e., T>a C T>. We assume 

that Y(s) ~ /(6» ) under H Q . Under Hi, Y(s) ~ /(6>i), 

when s € T>x, and Y(s) ~ /(#o) 5 when s g V/Vj,. Typical 
examples include 

1. Binomial distribution, f(8o) = Bin(N,po), and f(9i) = 
Bin(N,pi), with pi > po- 
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2. Poisson distribution, f(0 Q ) 
P(Ai), with Ai > A . 

3. Normal distribution, f(9o) 
Af(/xi,cr 2 ), with > fi . 



= P(A ), and f(9i) 
= N{n^a 2 ) and f{6i) 



Our main objective is to identify the unknown region T>a- 
Here we do not have any assumptions on the shapes of T> 
and T>a- 

One popular method to identify such spatial cluster 
is the spatial scan statistics. Scan statistics was first 
developed in Naus (1965a,b). See comprehensive review 
in Glaz and Balakrishnan (1999); Glaz et al. (2001, 2009). 
Kulldorff extended scan statistics into multidimensional 
case which includes spatial problems (Kulldorff, 1999). See 
also in Costa and Kulldorff (2009); Kulldorff ct al. (2006); 
Loh et al. (2008); Zhang and Lin (2009). Kulldorff also de- 
veloped a software to apply spatial scan statistics to spa- 
tial temporal data (Kulldorff, 2010). The basic idea of 
scan statistics is to use scan windows of different sizes 
and locations and perform likelihood ratio tests on all the 
scan windows using simulation. Typical window shapes in 
Kulldorff (2010) include circle or ellipse. Other methods 
for detecting spatial clusters include modified Knox test 
(Baker, 2004), point process method (Diggle et al., 2005), 
Bayesian hierarchical method (Liang ct al., 2009), methods 
based on K function (Loh, 2011; Wheeler, 2007). The spa- 
tial scan statistics methodology is shown to be very powerful 
compared to some other spatial cluster detection methods 
(Kulldorff et al., 2003). However, the power of the test is 
reduced when the shape of the true cluster is not circle or 
ellipsoid. It is also less useful when part of the inference 
objective is to identify the shape of the clusters. 

Another commonly used approach is to perform test at 
each location in T>, and use multiple comparison methods 
to overcome the issue of large number of tests. One typi- 
cal multiple comparison method is to control false discovery 
rate (FDR) (Bcnjamini and Hochbcrg, 1995; Storey, 2002). 
This method overcomes the fixed shape problem of the scan 
statistics, since it performs tests at every single individual 
location. However, it ignores the spatial information and the 
detected region is usually not spatially continuous. Recently 
this method has been extended by using an adjusted local 
false discovery rate method (Zhang et al., 2011) for spatial 
clustering detection. 

The multi-resolution cluster detection (MCD) method we 
propose in this paper takes advantage of the spatial infor- 
mation to make the test more powerful, while still main- 
tains the flexibility of detecting irregularly shaped spatial 



clusters. The MCD method is motivated by the scale-space 
inference ideas in functional estimation and image analy- 
sis (Chaudhuri and Marron, 1999, 2000; Lindeberg, 1993, 
1994). Zhang et al. (2007) and Zhang et al. (2008) applied 
scale-space inference method to time series analysis with ap- 
plications in Internet anomaly detection . The MCD method 
uses a similar idea as the scale-space inference by forming 
a test statistic based on multi-scale windows to effectively 
use the spatial information. A novel variability measure is 
used to identify the test threshold, which helps to maintain 
a balance between the sensitivity and specificity. The use of 
the variability measure overcomes the multiple comparison 
issues of the individual test, since it is a global procedure. 
Simulation studies and an application to fMRI data show 
that the MCD method compared favorably to the spatial 
scan statistic method and the multiple testing method based 
on FDR for identifying spatial clusters. 

The remaining part of this paper is organized as follows. 
In Section 2 we describe the methodology and theory, which 
includes detailed derivation of the MCD method for bino- 
mial, Poisson and Normal distributions. The validity of the 
local variability measure is also shown in this section. Sec- 
tion 3 provides simulation studies to compare different ap- 
proaches for Binomial distributions with different alterna- 
tive probability of success and different signal regions. An 
application of the three methods to a real fMRI data is given 
in Section 4, and in Section 5 we discuss possible extensions. 

2. METHODOLOGY 

Let Y(s), s € T>, be a sequence of independent random 
variables from the distribution Feu)- Here s is the location 
in a two dimensional region T>, and 9(s) are the population 
parameters. In the later sections, if s = (k,l), we may use 
Y(s) and Yjy interchangeably. Under the null hypothesis, 
we assume 9(s) = 9q for all s € T>. Under the alternative 
hypothesis, we assume 9{s) — 9 X for all s e C V, and 9 = 
9q otherwise. Our objective is to identify 2?^. We consider 
three examples in this paper: 1) T is Bin(No,p), and 9 = p, 
the probability of success; 2) T is Poisson(X), and 9 = A; 
3) T is N{\i,a^), and 6 = fj.. In the above three examples, 
No and <7q are assumed to be known. 

Our multi-resolution cluster detection (MCD) method 
has two steps: 1) At each location, we form a MCD test 
statistic which is expected to be large in the signal region, 
and small in the background region. 2) Determine a thresh- 
old for the test at each location which maintains a balance 
between specificity and sensitivity of the procedure. In Sec- 
tion 2.1, we describe the MCD test statistics and give de- 
tailed formula to compute the test statistics for the distribu- 
tions mentioned above. In Section 2.2 we introduce the no- 
tion of neighborhood variability and describe a method to 
determine the detection threshold based on neighborhood 
variability. Some theoretical results are derived in Section 
2.3 to justify our method for setting the threshold. In Sec- 
tion 2.4 we briefly discuss the choice of scales. 



2.1 The MCD test statistic 

The test statistic for MCD at each location is formed us- 
ing ideas from scale-space inference. In this subsection, let 
us focus on a specific location s. For easy presentation, let T> 
be a two dimensional grid with n r rows and n c columns, and 
s = with i = 1, 2, . . . , n r , and j = 1,2,..., n c . Define a 

sequence of local regions {s} = T>\ C T> 2 C • • • C V m C V. 
Note that our method does not require T> to be square, 
and the shapes of T>^ can be irregular as well. Let us de- 
fine the following aggregation vector {X}pX^, ■ ■ ■ ,X^) T , 
where = Yy, and 

(1) XV j = J2 Y kU for all 1 <r<M. 

kiev r 

In this section, since we focus on a single location, we 
further simplify our notation Xy to be X r . Thus, the 
aggregation vector is (Xi,X 2 , - ■ ■ ,Xm) T - Assuming that 
(xi , x 2 , ■ ■ ■ , xm) t is the observed vector, the likelihood ratio 
test statistics can be written as 

= sup 6o L(9; x) 

sup e L(9; x) 
_ supP(Xi = xi, X 2 = x 2 , ■ ■ ■ ,X M = xm\Qq) 

supP(Ai =xi,X 2 =x 2 ,--- ,X M = x M \9) ' 

Based on standard theory on likelihood ratio test, under 
appropriate regularity conditions, we have 

-21og(A)^ X 2 (A^)- 

Since T>^ is unknown, it is difficult to compute the de- 
nominator of the likelihood ratio statistic under the original 
alternative hypothesis. Instead we derive the formula for A 
under the alternative that the parameter 9 is a constant 
within each of the regions T>\, T> 2 — T>\, ■ • • , and T>k — ~Dk-x 
for three commonly used distributions, Binomial, Poisson 
and Normal, and use them as the test statistics. The results 
does not depend on the shapes of T> r . Thus, the user can use 
their favorite shape in the MCD method. We provide a func- 
tion in R to calculate the test statistic, which provides two 
regular shapes (square and circle) in the implementation. 
In this paper, we will use the square shape for illustration 
purpose. 

All the calculation next uses the following fact: 

P(Xi = x\, ■ ■ ■ , X M = x M ) 
= P(Xi = X\ , X 2 — X\ =x 2 —X\,--- , 
Xm — Xm-i = xm — xm-\) 

Note that since Yij are independent, the {Xi — Xi-i} are in- 
dependent to each other. Using this formula, it is relatively 
straightforward to derive the results for three different dis- 
tributions. In what follows we further introduce a notation 
m r = \T> r \, the cardinality of V r . We leave the details of the 
computation to the appendix. 
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2.1.1 Binomial distribution 



Let Yij ~ Bin(Nij,po) under the null hypothesis. Under 

the alternative, Yij ~ Bin(Nij,pk) for all G T> k \D k -i, 
with pk > po, and Bin(Nij,po) otherwise. It is relatively 
easy to get that, under the null hypothesis, 



X r 



kiev 



Y kl ~ Bin{N r ,p ), 



where N r 



kiev, 



Nf-i- It is straightforward that N t < 



iVj+i. Let pij — j^jprj (note that this adjustment is to avoid 
and 1 as an estimate, which is similar to the Wilson's 
estimate). Define the follows: 

Po = Medianx>(j5ij), 

pi = max(Median Cl (py),po), 

p r = max(Median I>r \ x , i ._ 1 (p i: ,),po)- 

Here the maximization is to make sure the estimate under 
the alternative is not smaller than the estimate under the 
null hypothesis, and we use median instead of mean for ro- 
bustness. Then we have the following formulae to compute 
the approximated likelihood ratio test statistic for Binomial 
distribution: 



-21og(A) = -2 log 



( su Pe„ L{0;x) 



\ supg, L(9; x) 
« -2 [xi (log(po) ~ log(pi)) + (iVi - xx) (log(l - po) 
- log(l - pi)) + (x 2 - xi) x (log(po) - log(p 2 ))+ 
((N 2 - Ni) - (x 2 - zi))(log(l - p ) - log(l -pa)) 

H h {x k - x fe _i)(log(p ) - log(pfc)) 

+ ((N k -N k -i) - (x k -x k -i)) 
x(log(l - po) - log(l - Pk))} ■ 

2.1.2 Poisson distribution 

Let Yij ~ -P(Ao) under the null hypothesis, and Yij ~ 
P(Afe) for (i,j) G T>k\T>k-i, with Afc > Ao- It is clear that, 
under the null hypothesis, 

~ P(m fc Ao). 

Let Ao = Median?? (Y^), and 



A; 



mi_i 



,A , 



then we have the following formulae to compute approxi- 
mate likelihood ratio test statistic for Poisson distribution: 



-2 log(A) » -2[a: 1 (log(Ao) - log(Ai)) + (Ai - A ) 

+ (x 2 - cci)(log(Ao) - log(A 2 )) + m 2 (A 2 - Ao)] 
= -2[(x k - x fe _i)(log(A ) - log(A fc )) 
-I- m k (X k - A )] 

2.1.3 Normal distribution 

We assume that under the null hypothesis, Y.^ ~ 
N(/i,a 2 ) and they are independent. Under the alternative, 
Yij ~ N(fi k ,a 2 ) when (i,j) G V k \D k ^ u and p fe > p. 

Here we assume that cr 2 is given. For real application, 
(T 2 is unknown, thus, we will use a robust estimate for the 
variance. 

Let po = Median-p (Yij), and 



We have 



-21og(A) 



&i — 1 

Pi = max I , po 

TOi - 771,-1 



2xi(pi - po) Mo - Pi 



+ 



2(z 2 - xi)(p 2 - go) ^2(Pp - pj) 

2 2 

2(a; fc - x k -i)(p k - po) 



2.2 Test threshold 

The likelihood ratio test theory suggests that % 2 is the 
approximated distribution of the test statistic, and one way 
to determine the test threshold is to control for FDR based 
on this asymptotic distribution. However, such procedure 
turns out to have poor performance due to the fact that 1) 
the asymptotic distribution may not be a good approxima- 
tion for small sample sizes; 2) the test statistics at different 
locations are correlated. 

In this section, we introduce a neighborhood variability 
measure to determine the test threshold. Let be the 

location to calculate the local variability. We choose the four 
nearest cells, where the cell serves as the center in the 
region, as shown in Figure 1. The variance of these five data 
points is defined as the neighborhood variability. 

Our assumption is that both signal region and the noise 
region have homogenous distribution within themselves. 
Thus, the local variability within these two regions should be 
small, while the local variability for points close to the clus- 
ter boundary should be relatively large. In order to identify 
the "optimal" detection threshold, we compute the average 
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Figure 1. The diagram for the neighborhood variability. At 
each location the variance of the four nearest cells plus 

the point will be calculated as the local variability. This 
helps to identify the edge of the detection boundary. 



where its local five observations (including the point itself) 
are not from the same group. See Figure 2 for illustrations 
of some boundary points. 
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neighborhood variability on the region with test statistics 
between two consecutive detection thresholds for a sequence 
of monotone increasing test thresholds ti, . . . , i&f. When 
the threshold is too small, most of the signal regions are 
identified as signal, along with many locations in the noise 
region. As the threshold increases, only the noise region are 
excluded, leading to a small average neighborhood variabil- 
ity. Similarly when the threshold is too large, the average 
neighborhood variability between two consecutive detection 
regions will also be small as they only include signal regions. 
We identify the two thresholds between which the average 
neighborhood variability is the highest and take the average 
of these two as the optimal detection threshold. 

To summarize, our MCD method can be carried out as 
follows: 

1. At each location s, compute the MCD test statistic 
T(s) at this location as described in Section 2.1 and the 
neighborhood variability measure V(s) as described in 
Section 2.2. 

2. Let t\ — min s T(s), tM = max s T(s), and define an 
arithmetic sequence t\, t%, ■ ■ ■ , ijw- 

3. Let V k = {s : T(s) > t k }, and V k = avejT(s) : s G 
2?/c,s (f. T>k+i}- Let k* = argmax fc Vfc. The optimal test 
threshold is defined as t* = 0.5(tfc* + tfc»+i), and the 
final detected region is defined as T>a — {s : T(s) > t*}. 

2.3 Validity of the test threshold 

In this subsection, we illustrate that under certain condi- 
tions, the MCD method can identify the signal region almost 
surely. Let B be the set of the boundary points, N be the 
set of inner points in the noise region, and S be the set of 
inner points in the signal region. For simplicity, we also use 
B c = AfUS. Furthermore, let rib be the number of boundary 
points, n s be the number of (non-boundary) signal points, 
and n n be the number of (non-boundary) noise points. Let 
N = rib + n n + n s be the total number of data points in the 
spatial region T>. A boundary point is defined as the point 



Figure 2. The diagram for four types of boundary points. The 
white cells are noise points, and the gray cells are signal points. 
Panel (a) shows a boundary point at the noise region with an 
angle; panel (b) shows a boundary point at the noise region 
without an angle; panel (c) shows a boundary point at the signal 
region with an angle; and panel (d) shows a boundary point at the 
signal region without an angle. 

For easy presentation, all the properties are shown un- 
der Normality assumption. We assume for any Y in the 
noise region, Y ~ N(0, 1); and for any Y in the signal re- 
gion, Y ~ N(6, 1), where S > 0. The parameter 8 measures 
the signal strength. In addition, the proportion of boundary 
point rib/N is denoted as pb- The following two theorems 
give some justification for the procedure we outlined in Sec- 
tion 2.2. The proofs of both theorems are in the appendix. 

Theorem 2.1. The average test statistic T at Af, B and S 

has the following relation: if N,nt,,n n ,n s — ¥ oo, and 5 > 0, 



1. 



P Ave T(Y) < AveT(r) < AveT(F) 
\Ye7V YeB Yes 



Theorem 2.2. // 



8 2 Np B (l~p B ) 
Pb) 



then 



P AveV(r) - Ave V(Y) > 
VyeB YeB" 



Remark: Theorem 2.1 shows that the MCD test statistic at 
points in the boundary region are on average smaller than 
those in the purely signal region and larger than those in 
the noise region. Thus which an appropriate sequence of 
test thresholds a substantial part of the boundary region 
can be in one of the belts defined by {s : s € s £ T>k+i}- 
Theorem 2.2 shows that the average variability measure on 
the boundary set is the largest if either the signal strength 
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Figure 3. The simulation settings 

is large or the points in both boundary and non-boundary 
regions are large. Thus by selecting the belt with the maxi- 
mum average variability, we have a good chance of identify- 
ing the true boundary of the signal region. 

2.4 Number of Scales 

To use the MCD method, one need to determine: 1) how 
many scales to be used, and 2) what is the largest scale to 
be used. The answer to these two questions are related to 
each other and depends on specific applications. Typically, 
when the maximum scale is the same, the test power of the 
MCD method with more scales tends to be larger, as shown 
in one dimensional situation in Zhang et al. (2007). For the 
two dimensional case, we compared two-scale vs. five-scale 
and single-scale methods and found that even though the 
test power of the five-scale method is better than the other 
two methods, the improvement from two to five is not as 
significant as those from one to two. Thus, in this paper, we 
will only use a two-scale MCD method for all the simulation 
examples and real application. 

By aggregating local values, the MCD method will have 
some dilution effect, i.e., the noise points which are close 
to the signal region are more likely to be identified as false 
signals. The larger the aggregation region, the higher chance 
of false detection for these points. Thus, we will not use 
a very large window as the maximum scale. For a typical 
100 x 100 image, we find that the largest aggregation scale 
around 10 (or a radius as 5) works well. In the following 
simulation and real examples, we will use window size 10 as 
our maximum scale. 
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Figure 4. The probability map of identification from the MCD 
method by different alternatives 

3. SIMULATION AND COMPARISON 

In this section, we use simulation to compare the MCD 
method with the single scale test method controlling FDR 
and the spatial scan statistics. All the examples are simu- 
lated in a 100 x 100 map. To save space, in this manuscript, 
we only simulate binomial distribution. Poisson and Normal 
examples are provided in the author's personal website. We 
use sensitivity and specificity to compare the performance 
of different methods. Under each simulation setting, we will 
repeat the same simulation 100 times, and also report the 
probability of identification at each spatial location. 

3.1 Simulation Settings 

Figure 3 shows the shapes of the simulations in this pa- 
per. We will study four different shapes of regions: L-shape, 
triangle, oval and F-shape. Note that the triangle and oval 
are convex, while Y and L-shapes are not. Note that a data 
image contains 10,000 pixels. Table 1 provides the number 
and the percentage of the pixels in the simulated signal re- 
gions. 



Table 1. Total true number of signal pixels 



Shapes 


L-shape 


Oval 


Triangle 


y-shape 


Number of pixels 


400 


1142 


864 


1344 


(percentage) 


4% 


11.42% 


8.64% 


13.44% 



For each simulation run, the background of the image is 
simulated as independent identically distributed Binomial 
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Figure 5. The probability map of identification from the 
individual tests adjusted by false discovery rate by different 
alternatives 

distribution Y(a) ~ 5m(100, 0.2), if s € V/V s . The signal 
part, Y(s) — Bin(100,pi), if s e V s , where pi > 0.2. In 
this paper, we focus on five different alternative probabili- 
ties of success, pi = 0.21, 0.22, 0.23, 0.24, and 0.25. Note 
that when pi = 0.21, the detection should be very challeng- 
ing. We also assume that the population is preknown. For 
real applications such as epidemiology, the population of a 
specific geographical location can be approximated by a re- 
cent survey. So this assumption is reasonable even for real 
applications. 

3.2 Simulation Results 

For each simulation setting, i.e., the same signal shape 
and the same alternative probability of success, we repeat 
the simulation 100 times. We use the average of several mea- 
sures to compare the performance of our method with other 
related methods. The measurements include specificity (the 
percentage of correctly identification of noise points), sen- 
sitivity (the percentage of correctly identification of signal 
points) , and the probability (relative frequency) of identifi- 
cations at each pixels. 

Tables 2-4 provides the average specificities and the 
average sensitivities for these different simulation settings. 
The numbers in the parenthesis provides the standard devi- 
ation of the performance measures. It shows that when the 
alternative probability increases, both sensitivity and speci- 
ficity of the MCD method increases. The variability also 
decreases as the alternative probability increases. Note that 
even when the alternative probability of success is 0.21, for 



Figure 6. The probability map of identification from the 
spatial scan method by different alternatives 

all the shapes, our MCD method has an average sensitivity 
of 40%, which is quite impressive, although the variability 
is large. Note that when the alternative probability of suc- 
cess increases to 0.22, the specificity increases dramatically 
to over 90%, meanwhile, the sensitivity also increases. 

Table 3 shows the specificity and sensitivity of the single 
scale method with test threshold selected to control false 
discovery rate. We use the direct false discovery rate method 
in Storey (2002). In order to have a sensible detection result, 
we use FDR level as 0.60, which is relatively high. Lower 
FDR level will lead to much smaller sensitivity. It shows that 
the specificity will decrease when the sensitivity increases. 
When the alternative probability is 0.22, the specificity of 
this method is similar to that of the MCD method, but the 
sensitivity is much smaller, which shows better performance 
of the MCD method over the single scale method. 

Table 4 shows the specificity and sensitivity for the Spa- 
tial Scan method. It shows that when the data is oval or 
triangle, the performance is among the best of these three 
methods, even when the signal strength is 0.21. But for 
L-shape and l"-shape, the sensitivity is comparable to the 
MCD method. When the signal strength increases, the MCD 
method performs similarly as the spatial scan method, even 
for the two convex shapes. The MCD method simultane- 
ously have better specificity and sensitivity than the Spatial 
Scan method on the other two complicated shapes. 

Figures 4 -6 provide the probability map of the pixels 
being detected by each of these three methods. The proba- 
bility is calculated by the relatively frequency of each pixels 
being detected by each method under the 100 runs under 
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Table 2. The average specificity/sensitivity for the MCD method by different simulation settings 





L-shape 


Oval shape 


Triangle shape 


Y- 


^hape 


PI 


ojJfcJCiiiciLy 


OciibiLi viiy 


OjJfcJClllL-lLy 


OcIlOlLlVlLy 


OJJcClllClLy 


Dcllnlll Vlly 


Specificity 


Sensitivity 




(Std) 


(Std) 


(Std) 


(Std) 


(Std) 


(Std) 


(Std) 


(Std) 


0.21 


0.8415 


0.3818 


0.8462 


0.3972 


0.8273 


0.4036 


0.8072 


0.406 




(0.2899) 


(0.3081) 


(0.2865) 


(0.3353) 


(0.3142) 


(0.323) 


(0.3193) 


(0.3558) 


0.22 


0.9401 


0.6252 


0.9309 


0.5125 


0.939 


0.5806 


0.9367 


0.5299 




(0.1516) 


(0.2574) 


(0.1621) 


(0.3865) 


(0.1459) 


(0.3533) 


(0.1436) 


(0.3832) 


0.23 


0.9845 


0.7986 


0.9738 


0.7669 


0.9801 


0.8079 


0.9626 


0.8232 




(0.0327) 


(0.2358) 


(0.0315) 


(0.3744) 


(0.0172) 


(0.3159) 


(0.0237) 


(0.3313) 


0.24 


0.987 


0.9387 


0.9769 


0.9003 


0.9774 


0.9455 


0.9484 


0.98 




(0.0043) 


(0.0999) 


(0.0108) 


(0.2776) 


(0.0079) 


(0.1827) 


(0.0963) 


(0.0911) 


0.25 


0.9856 


0.9723 


0.9745 


0.9817 


0.9759 


0.9923 


0.96 


0.9588 




(0.0032) 


(0.0198) 


(0.0061) 


(0.1261) 


(0.0043) 


(0.0069) 


(0.0118) 


(0.1309) 



Table 3. The average specificity/sensitivity for single scale detection method by different simulation settings 
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(0.0359) 


0.22 


0.8726 


0.2613 


0.9428 


0.1382 


0.9027 


0.212 


0.8532 


0.2904 




(0.0333) 


(0.0536) 


(0.0218) 


(0.046) 


(0.0266) 


(0.0458) 


(0.0348) 


(0.0539) 


0.23 


0.8167 


0.4322 


0.9305 


0.2277 


0.8653 


0.3556 
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0.4854 




(0.0427) 


(0.0621) 


(0.0265) 


(0.0612) 


(0.0323) 


(0.0566) 


(0.0459) 


(0.0614) 


0.24 


0.7632 


0.5976 


0.9126 


0.353 


0.826 


0.5122 
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0.6568 




(0.0451) 


(0.0566) 


(0.0222) 


(0.0593) 


(0.0386) 


(0.0598) 


(0.0597) 


(0.0772) 


0.25 


0.7208 


0.7272 


0.8989 


0.4755 


0.7954 


0.6464 


0.7342 


0.6584 




(0.0434) 


(0.0437) 


(0.0238) 


(0.0551) 


(0.0367) 


(0.0462) 


(0.1309) 


(0.2286) 



each simulation setting. All figures shows that when the sig- 
nal is weak, the chance of the signal region being detected 
is still much higher than the noise region. 

Figure 4 provides the probability map by the MCD 
method. It shows that the actually shapes are almost cor- 
rectly identified. There is a slight dilution effect by show- 
ing oval shapes on the corners. But overall the complicated 
shapes such as L and Y are correctly identified. 

Figure 5 provides the probability map of the pixels being 
detected by the single scale method after adjusting for the 
false discovery rate. On average, the shapes are correctly 
identified. But this figure shows that the probability of cor- 
rectly identified is relatively small. Even with a strong signal 
of 0.25, the probability of identification is about 60%. 

Figure 6 provides a similar probability map of the pix- 
els being detected by the Spatial Scan algorithm. We use 
the default circles as the scan window. It shows that for the 
oval shape signal, even when the signal strength is low, the 
Spatial Scan algorithm provides excellent performance. For 
other shapes, the performance is not bad in terms of the sen- 
sitivity. But it clearly misidentifies the shapes for triangle, 
L and F-shapes. 



3.3 Comparisons between different numbers 
of scales 

Two-scale MCD method has been used in the previous 
subsection. In this subsection, we compare it with a five- 
scale MCD method, where the maximum scanning windows 
of these two methods are the same. In addition, we compare 
with another single scale method: the original scale tests 
using the local variability measure adjusted threshold. Here 
we only consider a single setting with the L-shape, with the 
alternative probability of success as 0.22. Figure 7 shows the 
detection probability at each pixels. 

It shows that the signal scale method plus the local vari- 
ability measure adjustment is not powerful. In the signal re- 
gion, it only has 20% chance being detected. The two-scale 
MCD method with maximum scan window as 5 increases the 
detection probability to 80%. The five-scale MCD method 
with the same maximum scan window is very similar to 
the two-scale one. We also compared the ROC curves be- 
tween 2-scale and 5-scale. It turns out that on average the 
5-scale ROC has larger area under the curve than the 2- 
scale method, but they are very close to each other. See one 
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Table 4. The average specificity/sensitivity for the spatial scan method by different simulation settings 
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0.8342 
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Figure 7. Comparison of the detection probability for single scale, 
two-scale, and five-scale MCD methods, under the setting of 

Bin(W0, .2) vs. Bin(W0, .22) 

example ROC curve comparions in Figure 8. We note that 
the maximum scanning window is more important than the 
number of scales. 

4. EXAMPLES: FUNCTIONAL MRI DATA 

In this section we apply the three aforementioned meth- 
ods to a real fMRI data to illustrate their differences in 
real data application. The fMRI data was firstly analyzed 
in Maitra (2009, 2010); Maitra et al. (2002). 

The top left panel in Figure 9 shows 9 slices of the fMRI 
images in a total of 22 slices. Each individual image has 
128x128 pixels. All 9 images show activities in different re- 



Figure 8. The ROC curves for one specific simulation example. 
The green one is for a two-scale MCD method, and the purple one 
is for a five-scale MCD method 

gions, by showing hot colors (e.g. yellow or red colors which 
are positive values) and cool colors (dark blue shows neg- 
ative values). The response is a transformation of p- values 
from a previous study, and we are only interested in de- 
tection of the positive regions. By visual inspection of the 
raw data, it appears that there are some (positive) activities 
at the following regions: central right (slice 1), central left 
and central right (slice 5), top left (slice 11), bottom cen- 
tral (slice 15), top central (slice 18), central middle or top 
central (slice 20), and central part (slice 22). The shapes of 
these regions seem to be very irregular. 

The bottom left panel shows the single scale detection 
method with controlling false discovery rate (FDR) (20% 
false discovery rate). They highlight the regions that have 



8 Zhang and Zhu 



(a) Raw Data 



(b) Spatial Scan method 




Figure 9. The detection result for an fMRI data set. The selected 9 slices of image are shown in panel (a). Panel (b) shows 
the detection result by the spatial scan method. It identify some interesting regions, but their shapes are regular. Besides, 
there are some tiny regions outside the big dots, which may correspond to false discovery. Panel (c) shows marginal detection 
method with FRD control level at 0.20. It also highlights some regions in each plot, but these identified spots do not naturally 
form (continuous) region. Panel (d) provided the detection result by the MCD method. It highlights several irregular shaped 

regions in each image. 
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been mentioned in the previous paragraph, but there are also 
many more outside spots, which is likely to be false positives. 
In addition, these candidate regions do not naturally form 
clusters in the figures, which makes it hard to interpret. 

The top right panel shows the analysis results by the 
spatial Scan method. The method shows several important 
clusters in the plot, such as the clusters mentioned in pre- 
vious paragraph. However, these regions are identified as a 
round shape as suggested by the spatial scan method, and 
do not reveal any useful shape information which is impor- 
tant for the analysis of fMRI data. 

The bottom right panel shows the results generated by 
the MCD method. Within each slice, we identified several 
clusters, which are irregularly shaped. These regions form 
natural clusters, and also validate the initial visual impres- 
sion. These results show that our method is suitable for 
detection of irregular shaped spatial signals. 

5. DISCUSSION 

In this paper, we proposed a multiresolution cluster de- 
tection method, which is shown to have better performance 
empirically compared to single scale testing methods con- 
trolling for FDR and spatial scan statistics for detecting 
spatial clusters with irregular shape. We conjecture that the 
power of MCD method is larger than methods based on sin- 
gle scale (e.g. the individual test combined with multiple 
comparison), the theoretical proof of which will be a future 
research topic. An early study of multiresolution method 
in one dimensional situation (Zhang et al., 2007) provides a 
foundation for this type of research. Another issue that is 
worth investigating is the selection of number of scales and 
the largest scale for the MCD method, which we did not 
fully address in this paper. 
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under the null is 
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The maximum likelihood for the entire parameter space is 
quite challenging to be calculated. We may further introduce 
the following assumptions to simplify the calculation: For 
each region among the following k: T>i, T> 2 — 2?i, • • • , and 
T>k — T>k-\\ we assume that the probability of successes are 
the same. Thus, we could have the following estimate of the 
probability of successes: 



APPENDIX: DERIVATION OF THE 
PROPOSITIONS 

The test statistics can be derived relatively straightfor- 
ward as the follows: 

Derivation of the test statistic under 
Binomial assumption 

The likelihood function can be calculate by using the fol- 
lowing relation: 



PiX 1 = x u X 2 = x 2 
(2) =P(X 1 =x u X 2 - 

X k - x {k - v 



X 1 
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Note that {X 1 — X^ 1 are independent to each other. 

Under the null hypothesis, X 1 - A^ 1 ) - Bin^N 1 - 
A^ I_1 ),po)- The distribution of the alternative is much more 
complicated than this. Note that typically po and p\ are un- 
known. We will use the following methods to estimate po- 
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p r = max( Median (p^ ),po), Vr > 2 . 
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Based on this, the likelihood for the entire parameter space 
is 
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Thus, the likelihood ratio will be 
su Pe L(0;x) 
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Thus, the typical likelihood ratio statistic —2 log A can be 
easily calculated, which leads to the proposition. 

Derivation of the test statistic under Poisson 
assumption 

We will use the same calculation formula in (2) For Pois- 
son case, under the null hypothesis, X 1 — X l ~ 1 ~ P((|27^| — 
|2?i_i|)Ao) for all % > 2. We can define Ao = Median© (Xy). 
Under the alternative or the entire parameter space, the 
distribution may be the follows, X 1 - X 1 ' 1 ~ P((|X>i| - 
|X>i_i|)Ai), and 

Ai=max( |2? i |-|2? i _ 1 |' Ao) - 
Let m£ = T> k ~ fc-i, we have, under Hq, 

P{X^ = x x ,X 2 =x 2 ,--- ,X k = x k ) 

= P{X X = xi . X 2 - X 1 = x 2 -xi,-- - , 
X k_ x k-i =Xk _ Xk _ 1 \ 



This leads to the proposition 

Derivation of the test statistic under Normal 
assumption 

Under the null hypothesis, we know that X 1 — A 1-1 ~ 
N(miHo,m,ia 2 ) for all i > 2. We can define juo = 
Median-p (Ay ) . 

Under the alternative or the entire parameter space, 
the distribution may be the follows, X' 1 — X 1 ^ 1 ~ 
N(mi/j,i,mia 2 ), and 
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This leads to 



A = cxp [(xi - £i) 2 - (zi - /2 ) 2 ] 

+ o [{{X2 ~ ^l) - TO 2 ^2) 2 - ((X2 - ^l) - TO2/I0) 2 ] 

2TO2C 

H 1- n 1 J ((fffc - Zfc-l) - rrikfik) 2 
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+ m lCfil - Mo)]} • 
This leads to the proposition 

Derivation of Theorem 2.1 

Let Y be the measurement at the specific location, Yi, Y 2 , 
Y 3 , and Y4 are the four measurements collected in T>\. Note 
that Yi ~ iV(/x, 1). If li is in the noise region, Yi ~ iV(0, 1), 
and if Y* is in the signal region, Yi ~ iV"(J, 1). 

Use the same notation, we have X\ = Yq, and A" 2 = 
J2i=o^i- The test statistic is defined as 

-2 log A = 2Xi(tii - Mo) + (Mo - Ml) 

+ 2(AT 2 - Xi)(/X2 - Mo) + ™ 2 (Mo _ M 2 ) 

It is straightforward to have 



x N-l x 4 

^° = jV ^ 7l ' Mi=*o, M2 = ^X! yi ' 

i=0 i=l 



and m 2 = 4. 

Note that /io ~ N(n s 5/N, 1/N). Similarly, we can derive 
the distribution of fii and /i 2 as follows. 

1. If all Yq, Yi, • ■ ■ , Y4 are in the noise region, fi\ ~ Af(0, 1), 
and /I2 ~ N(0,l/4). 

2. If Yo is a noise point, but at the boundary as shown 
in Figure 2 (a) and (b), then fix ~ A^(0,1), and /2 2 ~ 
N(28/±, 1/4) (case (a)) or £ 2 ~ iV(<J/4, 1/4) (case (b)). 

3. If Yq is a signal point, but at the boundary as shown 
in Figure 2 (c) and (d), then fix ~ N(8,l), and /2 2 ~ 
N(28/4, 1/4) (case (c)) or ju 2 ~ N(3S/A, 1/4) (case (d)). 

4. If Yq, Yi, Y 4 are all in the signal region, jui ~ 
Ar(5,l), and £ 2 - N(6,l/4). 



Note that the test statistic, T can be written as 

T=2Fo ( yo "^S ri ) + ((^S Fi ) 2 " F ° 2 

4 / 4 JV-1 \ 

i=l \ i=l i=0 / 



For each individual T, it can be shown that when N — > 
00, we have 

T A- j2f + 4/I 2 

By using the above consideration on whether Yq is in 
the noise region, boundary, and signal region, we have the 
following results 

1. When all Yo, Yi, • • • , Y4 are in the noise region, jl\ ~ 

X 2 (0), (2£ 2 ) 2 ~ xf(0), and thus T 4 xi(0). 

2. When Yo is in noise region, and as shown in Figure 2 (a), 
Ml ~ X?(0), 2/2 2 ~ JV(<y, 1), and thus (2m 2 ) 2 ~ xW)- 
This leads to t4 xl{5 2 )- 

3. When Yo is in noise region, and as shown in Figure 2 
(b), fi\ ~ xi(0), 2/i 2 ~ N(5/2,l), and thus (2£ 2 ) 2 ~ 

X?(5 2 /4). This leads to T 4 x 2 2 {5 2 /A). 

4. When Yo is in signal region, and as shown in Figure 
2 (c), ~ xl(S 2 ), 2t 2 ~ JV(<5, 1), and thus (2/2 2 ) 2 ~ 

X?(£ 2 )- This leads to T 4 xl(2<5 2 ). 

5. When Yo is in signal region, and as shown in Figure 2 
(d), Mi ^ xW), 2£ 2 - W(3*/2, 1), and thus (2£ 2 ) 2 ~ 
X 2 (95 2 /4). This leads to T 4 xl(13(5 2 /4). 

6. When Yq, Yi, Y 4 are in the signal region, fi 2 ~ 
X?(<5 2 ), 2 M2 ~ iV(2*,l), and thus (2/2 2 ) 2 ~ xi(^ 2 ). 

This leads to T 4 xli^ 2 ). 

Note that /ii and /x 2 are independent to each other. Thus, 
we will have following relations: 

!2 Yo is an inner point in the noise region 
f{8) Yq is an boundary point , 
g (5) Yq is an inner point in the signal region 

where 2 + 5 2 /4 < /(<*) < 2 + 13<5 2 /4, and g(S) = 2 + 5S 2 . It 
is straightforward that, if 5 > 0, 



2 < f(5) < g(S). 



Similarly, we have 



4 Yo is an inner point in the noise region 
V(T) = { vi(S) Yq is an boundary point 

V2 (5) Yq is an inner point in the signal region 
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where 2(2 + S 2 /2) < Vl (S) < 2(2 + 13<5 2 /2), and v 2 (S) = 
2(2 + 10^). 

Let us assume that we have n n inner points in the noise 
region, n b boundary points, and n s inner points in the signal 
region. The average T(yo) will have 



where 



— £ t(y ) An (2 



4 

n„ 



- ]T T(Y )AN(f(8) 

n b V 



V GB 



- J2 T (Yo) 4 N (g{5), 



Y es 



n b 
v 2 (S) 



Thus, if n n , n b , and n s — > 00, we will have stochastically 

p ( v E r(Fo) < i E T(Fo) < r £ 1 

Derivation of Theorem 2.2 

Let us use similar notation as above. Let B be the bound- 
ary point, which is defined as at least one of its neighbor 4 
points is from another group. B c be non-boundary point. 
The local variability is defined as \ J2i=oO^ ~ Y) 2 . Let us 
focus on V(Y) — X)i=oC^ ~ Y) 2 . It is straightforward to 
show that if Y G B c , V(Y) ~ x|(0). Similarly as derivation 
in the former section, let Yi ~ N(fii, 1). If Yb € <B, we have 
S ^ 2 ~ xl(2 M?)- It can be shown that /if = fc5 2 , where fc 
is the number of signal points in the local region (note that, 
k = 1, 2, 3, 4). Let V = -| 53i=o ^> ^ * s straightforward that 
Y ~ N(kS/5, 1/5), or V5Y - N(k5/V5, 1). And thus 



where 



V(Y) ~ X 2 (C), 



C = M 2 - fc 2 <5 2 /5, fc = 1,2,3,4. 



Then, E{V{Y)) = 4 + C, and Var(F(Y)) = 8 + 4C, if Y € B. 
Similarly, E{V{Y)) = 4, and Var(V(Y)) = 8, when Y G 6 C . 
By the central limit theorem, we have 

8 + 4C 
n b 



Ave F(Y) 4JV 4 + C, 
ygB 



Ave V(Y) A AT I 4, ■ 



n b 



And thus 

Ave V(Y) - Ave V(Y) An(c, ^±^L + — ? — 



If we want to P(Ave Y eB V(Y) - Ave yeS c V(Y) > 0) 1, 
we need to have 



I CNp B (l- PB ) 
8 + 4(l-p fl ) ' 



Here ps be the proportion of boundary point in the im- 
age. So if S I f . p A fS 1 ~ p f ^ —> 00, then a — > 00. This leads 

to P(Avcy eB \/(r) - Ave Yee ci/(y") > 0) -> 1. And then 
P(Ave Yee Y(Y) - Ave yee e\/(y") > 0) -> 1. 



$(a) 1. 
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